Bayesian Neural Word Embedding
Author
Abstract
Recently, several works in the domain of natural language processing have presented successful methods for word embedding. Among them, the Skip-Gram (SG) model with negative sampling, also known as word2vec, advanced the state of the art on various linguistic tasks. In this paper, we propose a scalable Bayesian neural word embedding algorithm that can be beneficial to general item similarity tasks as well. The algorithm relies on a Variational Bayes solution for the SG objective, and a detailed step-by-step description of the algorithm is provided. We present experimental results that demonstrate the performance of the proposed algorithm and show that it is competitive with the original SG method.
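For context, the Skip-Gram negative-sampling objective that the Bayesian treatment targets can be written per word-context pair $(w, c)$ as follows (following Mikolov et al., 2013; the notation here is illustrative and not taken from this paper):

```latex
\log \sigma\!\left(u_c^{\top} v_w\right)
\;+\;
\sum_{i=1}^{k} \mathbb{E}_{c_i \sim P_n(c)}
\left[ \log \sigma\!\left(-u_{c_i}^{\top} v_w\right) \right]
```

where $\sigma$ is the logistic sigmoid, $v_w$ and $u_c$ are the target and context embeddings, and the $k$ negative contexts $c_i$ are drawn from a noise distribution $P_n$.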
Similar Resources
Word Semantic Representations using Bayesian Probabilistic Tensor Factorization
Many forms of word relatedness have been developed, providing different perspectives on word similarity. We introduce a Bayesian probabilistic tensor factorization model for synthesizing a single word vector representation and per-perspective linear transformations from any number of word similarity matrices. The resulting word vectors, when combined with the per-perspective linear transformati...
Word Clustering Using Word Embedding Generated by Neural Net-based Skip Gram
This paper proposes word clustering using word embeddings. We used a neural net-based continuous skip-gram method for generating word embeddings in continuous space. The proposed word clustering method represents each word in the vector space using a neural network. The K-means clustering method partitions the word embeddings into a predetermined K-word...
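As an illustration of the clustering step described in the snippet above (a minimal sketch using plain NumPy, not the cited paper's implementation), K-means over word vectors can be written as:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal K-means: partition the rows of X into k clusters.

    X is an (n, d) array of word embeddings; returns an (n,) array
    of cluster labels in {0, ..., k-1}.
    """
    rng = np.random.default_rng(seed)
    # initialize centers as k distinct random rows of X
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each vector to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center as the mean of its assigned vectors
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# toy "embeddings": two well-separated groups of 4-d vectors
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (5, 4)),
               np.random.default_rng(2).normal(5.0, 0.1, (5, 4))])
labels = kmeans(X, k=2)
```

In practice the embedding matrix would come from a trained skip-gram model; the random vectors here merely stand in for it.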
A New Document Embedding Method for News Classification
Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is a lot of news spreading on the web. A text classifier can categorize news automatically, and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...
Smart Data: Where the Big Data Meets the Semantics
Big data technology is designed to address the challenges of the three Vs of big data, including volume (massive amount of data), variety (a range of data types and sources), and velocity (speed of data in and out). Big data is often captured without a specific purpose, leading to most of it being task-irrelevant data. The most important feature of data is neither the volume nor the other Vs, b...
Enhancing Translation Language Models with Word Embedding for Information Retrieval
In this paper, we explore the use of word embedding semantic resources for the Information Retrieval (IR) task. These embeddings, produced by a shallow neural network, have been shown to capture semantic similarities between words (Mikolov et al., 2013). Hence, our goal is to enhance IR language models by addressing the term mismatch problem. To do so, we applied the model presented in the paper Inte...
Published: 2017